Missing Word Counts

Authors

  • Sunil Abraham
  • Greg Brockman
  • Anant P. Godbole
Abstract

The English translation of Leo Tolstoy’s novel War and Peace has the following notable property: it contains this paragraph as a subsequence. If one were to write the letters and spaces that appear in the book as a string, then there would be a subsequence of the string that is identical to the string of letters and spaces in this paragraph. The full property is more general than that – War and Peace contains as a subsequence any possible string of up to 950 letters and spaces (the TeX code for this paragraph has 866 characters). That includes valid English text such as the first 950 characters of President Obama’s Inaugural Address, as well as a string of 950 “q”s. War and Peace is thus a tome that is 950-omnibus (or 950-omni) over the 27-character alphabet {a, b, c, . . . , z, SPACE}. Of course, such a text is not at all hard to create by design. Consider writing the string “abcd . . . xyz ” 950 times. Clearly one could then find as
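
The omnibus property described in the abstract can be checked mechanically. The following is a minimal Python sketch and is not taken from the paper; the names `is_subsequence` and `omni_level` are hypothetical. It assumes the text has already been reduced to lowercase letters and spaces, and it relies on a greedy argument: scan left to right, and each time every character of the alphabet has been seen at least once, close a block and start over. The number of completed blocks is the largest k for which the text is k-omni.

```python
from string import ascii_lowercase

# The 27-character alphabet from the abstract: a..z plus SPACE.
ALPHABET = set(ascii_lowercase + " ")


def is_subsequence(pattern, text):
    """Return True if `pattern` can be obtained from `text` by deleting characters."""
    it = iter(text)
    # `ch in it` advances the iterator until it finds ch (or is exhausted),
    # so characters of `pattern` must be matched in order.
    return all(ch in it for ch in pattern)


def omni_level(text, alphabet=ALPHABET):
    """Return the largest k such that every length-k string over `alphabet`
    is a subsequence of `text` (hypothetical helper, greedy block counting).

    Characters of `text` outside `alphabet` are ignored.
    """
    alphabet = set(alphabet)
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    seen = set()
    level = 0
    for ch in text:
        if ch in alphabet:
            seen.add(ch)
            if len(seen) == len(alphabet):
                # A full copy of the alphabet has been collected: close the block.
                level += 1
                seen.clear()
    return level


if __name__ == "__main__":
    # The designed text from the abstract: "abcd...xyz " written 950 times.
    designed = (ascii_lowercase + " ") * 950
    print(omni_level(designed))                    # 950
    print(is_subsequence("q" * 950, designed))     # True
    print(is_subsequence("q" * 951, designed))     # False
```

The greedy count works because any word of length k can be matched one character per block, while a word built from the last character to complete each block, followed by a character missing from the leftover tail, has length k + 1 and cannot be matched.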

Similar resources

Two modes of assessment: the case of academicians' writing

This study attempted to investigate writing problems and the relationship between expert-assessment and self-assessment of writing problems. Participants were thirty-four non-English faculty members of Tehran and Guilan universities. The instruments were writing an essay on the topic "What teaching strategies do you use in your classes?" in twenty-five lines and filling the questionnaire of wri...

Smoothing a Tera-word Language Model

Frequency counts from very large corpora, such as the Web 1T dataset, have recently become available for language modeling. Omission of low frequency n-gram counts is a practical necessity for datasets of this size. Naive implementations of standard smoothing methods do not realize the full potential of such large datasets with missing counts. In this paper I present a new smoothing algorithm t...

The Effect of Word Meaning on Speech DysFluency in Adults with Developmental Stuttering

Objectives: Stuttering is one of the most prevalent speech and language disorders. The symptomatology of stuttering has been studied from different perspectives, such as biological, developmental, environmental, emotional, learning and linguistic. Previous research on English-speaking people has suggested that some linguistic features such as word meanings may play a role in the frequency of speech non...

Efficient fusion of aggregated historical data

Background. In this paper, we address the challenge of recovering a time sequence of counts from aggregated historical data. For example, given a mixture of the monthly and weekly sums, how can we find the daily counts of people infected with flu? In general, what is the best way to recover historical counts from aggregated, possibly overlapping historical reports, in the presence of missing va...

Estimation of missing traffic counts using factor, genetic, neural, and regression techniques

Analyses from some highway agencies show that up to 50% of permanent traffic counts (PTCs) have missing values. It would be difficult to eliminate such a significant portion of data from traffic analysis. A literature review indicates that the limited existing research uses factor or autoregressive integrated moving average (ARIMA) models for predicting missing values. Factor-based models tend to be le...

Data augmentation and language model adaptation

A method is presented for augmenting word n-gram counts in a matrix which represents a 2-gram Language Model (LM). This method is based on numerical distances in a reduced space obtained by Singular Value Decomposition (SVD). Rescoring word lattices in a spoken dialogue application using an LM containing augmented counts has led to a Word Error Rate (WER) reduction of 6.5%. By further interpol...

Publication date: 2009